Perceiver: General Perception with Iterative Attention
In this paper we introduce the Perceiver, a model that builds upon Transformers and hence makes few architectural assumptions about the relationship between its inputs, but that also scales to hundreds of thousands of inputs, like ConvNets.
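How it scales: a small latent array of N vectors cross-attends to the large input array of M vectors, so the expensive step costs on the order of N*M rather than the M*M of ordinary self-attention. A latent Transformer then runs only over the N latents, and the cross-attend/self-attend pair can be repeated (the "iterative attention" of the title). Below is a minimal pure-Python sketch of that pattern, assuming single-head dot-product attention with no learned Q/K/V projections, layer norms, MLPs, residuals or positional encodings, all of which the real model has. The function names (`cross_attend`, `perceiver_sketch`, `attention_cost`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], inputs: List[Vector]) -> List[Vector]:
    """Each query attends over every input vector.

    Simplification (assumption): keys = values = inputs, no learned projections.
    Work is len(queries) * len(inputs) dot products.
    """
    d = len(inputs[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in inputs]
        weights = softmax(scores)
        # Output is a convex combination of the input (value) vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, inputs)) for j in range(d)])
    return out


def perceiver_sketch(latents: List[Vector], inputs: List[Vector],
                     num_iterations: int = 2, num_latent_layers: int = 2) -> List[Vector]:
    z = latents
    for _ in range(num_iterations):
        # Asymmetric step: N latents read from M inputs (cost ~ N * M).
        z = cross_attend(z, inputs)
        # Latent Transformer: self-attention among the N latents only (cost ~ N * N).
        for _ in range(num_latent_layers):
            z = cross_attend(z, z)
    return z


def attention_cost(num_queries: int, num_keys: int) -> int:
    # Number of query-key dot products in one attention layer.
    return num_queries * num_keys
```

For a sense of scale (illustrative numbers): a 224x224 image flattened to pixels gives M = 50,176 inputs, so `attention_cost(50176, 50176)` is about 2.5 billion, while `attention_cost(512, 50176)` with 512 latents is about 25.7 million.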
Modalities evaluated: images, point clouds, audio, video, and video+audio.
Multimodal research (images and audio)